Custom Content Policies

Overview

Custom Content policies let you enforce your own rules to identify and block content that violates organizational policies or guidelines. Examples include "No Financial Advice", "Prohibit Coercive Language", and "Prohibit Client Account Numbers". Custom Input Content policies moderate user-provided inputs, and Custom Output Content policies moderate AI model responses.

Custom Content Policy Actions

You can manage what happens to inputs and outputs when applying content policies using the actions below:

  • Flag: flag content for moderator review
  • Block: block user inputs or model outputs containing content violating policy

Custom Content Policy Types

Custom Content includes two policy types:

  • Behavioral Content Policy: Detects requests, intent, or conversational behavior that violates your policy.
  • Data Classification: Detects the actual presence of specific sensitive content or data structures in text.

Choose Behavioral Content when you want to stop the request itself, such as "What is John's SSN?". Choose Data Classification when you want to detect the content itself, such as an actual SSN, payment card number, credential, or organization-specific identifier appearing in the text. Currently, Data Classification policies are created as input Custom Content policies.
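
The difference between the two policy types can be illustrated with a small Python sketch. This is not how the product implements Data Classification; the `contains_ssn` function and the `SSN_PATTERN` regular expression are hypothetical, and the pattern assumes a US SSN written in the form 123-45-6789. The sketch only shows why a check for the presence of data does not catch a request for that data.

```python
import re

# Assumed format for illustration: a US SSN written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn(text: str) -> bool:
    """Hypothetical presence check: is an SSN-shaped value in the text?"""
    return SSN_PATTERN.search(text) is not None


request = "What is John's SSN?"          # asks for an SSN, contains none
disclosure = "John's SSN is 123-45-6789."  # contains an SSN-shaped value

print(contains_ssn(request))     # False
print(contains_ssn(disclosure))  # True
```

The first string contains no sensitive data, so a presence check of this kind passes it; stopping that request is the job of a Behavioral Content policy. The second string contains the data itself, which is what Data Classification is meant to detect.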